Abstract

This short note shows that all block relaxation algorithms can be formulated as 
majorization algorithms. The result is mostly a curiosity, without any obvious practical 
applications. 
Note: This is a working paper which will be expanded/updated frequently. All suggestions
for improvement are welcome. The directory deleeuwpdx/pubfolders/block has a pdf version,
the complete Rmd file, and the bib file.

1 Introduction 

We use notation and terminology taken from De Leeuw (1994). 

2 Block Relaxation 

To minimize $g : X \otimes Y \to \mathbb{R}$ over $x \in X$ and $y \in Y$ we can use the block relaxation algorithm.

$$
\begin{aligned}
y^{(k+1)} &\in \mathop{\mathrm{argmin}}_{y \in Y}\ g(x^{(k)}, y),\\
x^{(k+1)} &\in \mathop{\mathrm{argmin}}_{x \in X}\ g(x, y^{(k+1)}).
\end{aligned}
$$

Note that the argmins are point-to-set maps, because the minima over blocks are not
necessarily unique.

As an example, consider $g(a, b) = \mathrm{SSQ}(y - Xa - Zb)$, with $\mathrm{SSQ}()$ the sum of squares. The
algorithm, using Moore-Penrose inverses, is

$$
\begin{aligned}
b^{(k+1)} &= Z^+(y - Xa^{(k)}),\\
a^{(k+1)} &= X^+(y - Zb^{(k+1)}).
\end{aligned}
$$
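
The two updates above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not in the text: X and Z are taken to be single nonzero columns x and z, so that the Moore-Penrose inverse of a column v reduces to v'/(v'v). The function names and the data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def block_relaxation_ls(y, x, z, a=0.0, iters=200):
    """Alternate b = z^+ (y - x a) and a = x^+ (y - z b).

    Assumption: x and z are single nonzero columns, so the
    Moore-Penrose inverse of a column v is v' / (v'v).
    """
    xx, zz = dot(x, x), dot(z, z)
    b = 0.0
    for _ in range(iters):
        # b-step: regress the current residual y - x a on z
        b = dot(z, [yi - xi * a for yi, xi in zip(y, x)]) / zz
        # a-step: regress the current residual y - z b on x
        a = dot(x, [yi - zi * b for yi, zi in zip(y, z)]) / xx
    return a, b


def ssq(y, x, z, a, b):
    """g(a, b) = SSQ(y - x a - z b)."""
    return sum((yi - xi * a - zi * b) ** 2 for yi, xi, zi in zip(y, x, z))


# Hypothetical data with y = 2 x + 3 z exactly, so the minimum of g is 0.
x = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 0.0, 1.0, 0.0]
y = [2.0 * xi + 3.0 * zi for xi, zi in zip(x, z)]
a_hat, b_hat = block_relaxation_ls(y, x, z)
```

For these data the error in a shrinks by the factor (x'z)^2 / ((x'x)(z'z)) = 16/60 per sweep, so the iterates approach a = 2 and b = 3 linearly. With highly collinear columns the same factor is close to one and the sweeps become slow.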


3 Augmentation 

Suppose the original problem is to minimize $f : X \to \mathbb{R}$ over $x \in X$ and we can find
$g : X \times Y \to \mathbb{R}$ such that $f(x) = \min_{y \in Y} g(x, y)$. Such a $g$ is called an augmentation of $f$.
Minimizing $f$ over $x \in X$ can be done by applying block relaxation to the augmentation $g$
over $x \in X$ and $y \in Y$.

In least squares factor analysis, for example, we minimize

$$
f(X) = \mathrm{SSQ}(\mathrm{off}(R - XX')),
$$

where $\mathrm{off}(X) = X - \mathrm{diag}(X)$. Choose the augmentation

$$
g(X, \Delta) = \mathrm{SSQ}(R - XX' - \Delta),
$$

where $\Delta$ varies over diagonal matrices. The block relaxation algorithm is

$$
\begin{aligned}
\Delta^{(k+1)} &= \mathrm{diag}(R - X^{(k)}(X^{(k)})'),\\
(R - \Delta^{(k+1)})X^{(k+1)} &= X^{(k+1)}\Lambda,
\end{aligned}
$$

where $\Lambda$ is a symmetric matrix of Lagrange multipliers. Thus finding $X^{(k+1)}$ involves solving
the eigen problem for $R - \Delta^{(k+1)}$.
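
A short derivation may help to see why this choice is an augmentation of the factor analysis loss. It is a sketch that uses only the definitions above, writing $\Delta$ for the diagonal matrix in the augmentation. Because $\Delta$ is diagonal, the off-diagonal and diagonal parts of the residual separate:

$$
g(X, \Delta) = \mathrm{SSQ}(\mathrm{off}(R - XX')) + \mathrm{SSQ}(\mathrm{diag}(R - XX') - \Delta).
$$

The first term does not depend on $\Delta$, and the second term vanishes at $\Delta = \mathrm{diag}(R - XX')$. Hence

$$
\min_{\Delta}\ g(X, \Delta) = \mathrm{SSQ}(\mathrm{off}(R - XX')) = f(X),
$$

which is both the augmentation property and the first update of the algorithm.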

4 Majorization 

Again we want to minimize $f : X \to \mathbb{R}$ over $x \in X$. Suppose there is a $g : X \times X \to \mathbb{R}$ such
that $g(x, y) \geq f(x)$ for all $x \in X$ and $y \in X$ and such that $g(x, x) = f(x)$ for all $x \in X$.
Such a $g$ is called a majorization of $f$. Minimize $f$ over $x \in X$ by applying block relaxation
to the majorization $g$ over $x \in X$ and $y \in X$.

Clearly any majorization of $f$ is also an augmentation of $f$. Majorization is a special type
of augmentation because $X = Y$ and $x \in \mathop{\mathrm{argmin}}_{y \in Y} g(x, y)$. Thus the block relaxation is
simply

$$
x^{(k+1)} \in \mathop{\mathrm{argmin}}_{x \in X}\ g(x, x^{(k)}).
$$

Thus majorization algorithms are block relaxation algorithms. 
5 Majorization from Blocking


Suppose $h : X \otimes Z \to \mathbb{R}$. Define $T(x) = \mathop{\mathrm{argmin}}_{z \in Z} h(x, z)$, and suppose $t(x)$ is a selection
from $T(x)$, i.e. $t(x) \in T(x)$ for all $x \in X$. Define $f(x) = h(x, t(x))$ and $g(x, y) = h(x, t(y))$.
Then $g(x, y) \geq g(x, x) = f(x)$. Thus $g$ is a majorization of $f$. The majorization algorithm for
$f$ and $g$ is simply the block relaxation algorithm for $h$. Thus block relaxation algorithms are
majorization algorithms. Our reasoning here is very similar to Lange (2016) (section 4.9).


As an example consider

$$
h(X, \Delta) = \mathrm{SSQ}(R - XX' - \Delta).
$$

Then

$$
f(X) = \mathrm{SSQ}(\mathrm{off}(R - XX')),
$$

and the majorization of $f$ is

$$
g(X, Y) = \mathrm{SSQ}(R - XX' - \mathrm{diag}(R - YY')).
$$


Another example is

$$
h(a, b) = \mathrm{SSQ}(y - Xa - Zb).
$$

Then

$$
f(a) = (y - Xa)'(I - ZZ^+)(y - Xa),
$$

and the majorization of $f$ is

$$
g(a, b) = \mathrm{SSQ}(y - Xa - ZZ^+(y - Xb)).
$$
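
The regression example can be checked numerically. The sketch below again assumes, for simplicity only, that X and Z are single nonzero columns x and z, and it uses hypothetical data and function names. It builds f and g from h through the selection t, checks the two majorization conditions on a grid, and runs the majorization iteration, which coincides with a block relaxation sweep.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def t(b, y, x, z):
    """Selection t(b) = z^+ (y - x b), the minimizer of h(b, .) over the second block."""
    return dot(z, [yi - xi * b for yi, xi in zip(y, x)]) / dot(z, z)


def h(a, c, y, x, z):
    """h(a, c) = SSQ(y - x a - z c)."""
    return sum((yi - xi * a - zi * c) ** 2 for yi, xi, zi in zip(y, x, z))


def f(a, y, x, z):
    """f(a) = h(a, t(a)) = (y - x a)'(I - z z^+)(y - x a)."""
    return h(a, t(a, y, x, z), y, x, z)


def g(a, b, y, x, z):
    """g(a, b) = h(a, t(b)) = SSQ(y - x a - z z^+ (y - x b))."""
    return h(a, t(b, y, x, z), y, x, z)


def majorization_step(b, y, x, z):
    """Minimizer of g(., b): a = x^+ (y - z t(b)), one block relaxation sweep."""
    c = t(b, y, x, z)
    return dot(x, [yi - zi * c for yi, zi in zip(y, z)]) / dot(x, x)


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 0.0, 1.0, 0.0]
y = [1.0, 3.0, 2.0, 5.0]

grid = [i / 4.0 - 2.0 for i in range(17)]  # -2, -1.75, ..., 2
majorizes = all(g(a, b, y, x, z) >= f(a, y, x, z) - 1e-9 for a in grid for b in grid)
touches = all(abs(g(a, a, y, x, z) - f(a, y, x, z)) < 1e-9 for a in grid)

# Run the majorization algorithm and record f along the way.
a = 0.0
values = [f(a, y, x, z)]
for _ in range(20):
    a = majorization_step(a, y, x, z)
    values.append(f(a, y, x, z))
monotone = all(v1 <= v0 + 1e-9 for v0, v1 in zip(values, values[1:]))
```

The flags `majorizes` and `touches` correspond to the two defining conditions of a majorization, and `monotone` records that the recorded values of f never increase, as the sandwich inequality guarantees.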


Clearly we can also interchange the role of the two blocks. In the factor analysis example we
can minimize out $X$ to get

$$
f(\Delta) = \sum_{s=p+1}^{n} \lambda_s^2(R - \Delta),
$$

where the $\lambda_s(X)$ are the ordered eigenvalues of $X$ (assuming the $p$ largest eigenvalues are
non-negative). The majorization function is

$$
g(\Delta, \Omega) = \mathrm{SSQ}(R - \Delta - (R - \Omega)_p),
$$

with $(X)_p$ the best rank $p$ approximation of $X$.


6 Partial Majorization 

Suppose the problem we want to solve is minimizing $g(x, y)$ over $x \in X$ and $y \in Y$. If
both minimizing $g(x, y)$ over $x \in X$ for fixed $y \in Y$ and minimizing $g(x, y)$ over $y \in Y$
for fixed $x \in X$ are easy, then we often use block relaxation, alternating the two conditional
minimization problems until convergence.

But now suppose only one of the two problems, say minimizing $g(x, y)$ over $y \in Y$ for fixed
$x \in X$, is easy. Define

$$
f(x) = \min_{y \in Y} g(x, y)
$$

and let $y(x)$ be any $y \in Y$ such that $f(x) = g(x, y(x))$.
Suppose we have a majorizing function $h(x, z)$ for $f(x)$. Thus

$$
\begin{aligned}
f(x) &\leq h(x, z) \quad \forall x, z \in X,\\
f(x) &= h(x, x) \quad \forall x \in X.
\end{aligned}
$$

Suppose our current best solution for $x$ is $\tilde{x}$, with corresponding $\tilde{y} = y(\tilde{x})$. Let $x^+$ be any
minimizer of $h(x, \tilde{x})$ over $x \in X$. Now

$$
g(x^+, y(x^+)) = f(x^+) \leq h(x^+, \tilde{x}) \leq h(\tilde{x}, \tilde{x}) = f(\tilde{x}) = g(\tilde{x}, y(\tilde{x})),
$$

which means that $(x^+, y(x^+))$ gives a loss function value no larger than that of $(\tilde{x}, y(\tilde{x}))$. Thus we have,
under the usual conditions, a convergent algorithm.